Distances between power spectral densities*

Tryphon T. Georgiou



Abstract 



We present several natural notions of distance between spectral density functions of (discrete-time) random processes. They are motivated by certain filtering problems. First, we quantify the degradation of performance of a predictor which is designed for a particular spectral density function and then used to predict the values of a random process having a different spectral density. The logarithm of the ratio between the variance of the error and the corresponding minimal (optimal) variance produces a measure of distance between the two power spectra with several desirable properties. Analogous quantities based on smoothing problems produce alternative distances and suggest a class of measures based on fractions of generalized means of ratios of power spectral densities. These distance measures endow the manifold of spectral density functions with a (pseudo-)Riemannian metric. We pursue one of the possible options for a distance measure, characterize the relevant geodesics, and compute corresponding distances.
I. Introduction



Given the centrality of spectral analysis in a wide range of scientific disciplines, there has been a variety of viewpoints regarding how to quantify distances between spectral density functions. Besides the obvious ones which are based on norms inherited from ambient function spaces ($L_2$, $L_1$, etc.), there has been a plethora of alternatives which attempt to acknowledge the structure of power spectral density functions as a positive cone. The most well known are the Kullback-Leibler divergence, which originates in hypothesis testing and in Bayes' estimation, the Itakura-Saito distance, which originates in speech analysis (both belonging to the Bregman class [13], [4], [12]), the Bhattacharyya distance [2], and the Ali-Silvey class of divergences [1]. Their origin can be traced either to a probabilistic rationale (as in the case of the Kullback-Leibler divergence) or to some ad hoc mathematical construct designed to seek distance measures with certain properties (as in the case of the Bregman and Ali-Silvey classes). The purpose of this work is to introduce certain new notions of distance which are rooted in filtering theory and provide intrinsic distance measures between any two power spectral density functions.

Our starting point is a prediction problem. We select an optimal predictive filter for an underlying random process based on the assumption that the process has a given power spectral density $f_1(\theta)$. We then evaluate the performance of such a filter against a second power spectral density $f_2(\theta)$, which may be thought of as the spectral density function of the "actual" random process. The relative degradation of performance (i.e., of the variance of the prediction error) quantifies a mismatch between the two functions. Interestingly, it turns out to be equal to the ratio of the arithmetic over the geometric mean of the fraction of the two power spectra. The logarithm of the relative degradation serves as a distance measure.

Infinitesimal analysis suggests a pseudo-Riemannian metric on the manifold of power spectral density functions. The presence of such a metric suggests that geodesic distances may be used to quantify divergence between power spectra. Indeed, a characterization of geodesics is provided, and certain logarithmic intervals are shown to satisfy the condition. The length of such intervals connecting two power spectral densities provides yet another notion of distance between the two.
An identical approach based on the degradation of performance of smoothing filters leads to other expressions which, equally well, quantify divergence between power spectral densities. Two observations appear to be universal. First, what turns out to be important is the mismatch between the "shapes" of the spectral density functions. This is quantified by how far the ratio $f_1/f_2$ of the two spectral densities is from being constant across frequencies. The ratio of spectral density functions is reminiscent of the likelihood ratio in probability theory.

The second observation is that all of the distance measures that we encountered, in essence, compare different means (i.e., arithmetic, geometric, and harmonic, possibly weighted) of the ratio of the two spectral density functions or of their logarithms. It is standard that, e.g., the arithmetic and the geometric means coincide only when the ratio is constant and have a gap otherwise. The same applies to a wider family of generalized means. Thus, this observation suggests a much larger class of possible alternatives: quantify the divergence between (the "shape" of) two density functions using the gap between two generalized means of their ratio, or by the slackness of Jensen-type inequalities involving this ratio. The underlying mathematical construct appears quite distinct from those utilized in defining the Bregman and the Ali-Silvey classes of distance measures. Furthermore, the mathematical construct is deeply rooted in prediction theory and, at least in certain cases, can be motivated as quantifying degradation of performance, as explained earlier.



II. Preliminaries on least-variance prediction and smoothing

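Consider a scalar zero-mean stationary random process $\{u_k,\ k\in\mathbb{Z}\}$ and denote by $R_0, R_1, R_2, \ldots$ its sequence of autocorrelation samples and by $d\mu(\theta)$ its power spectrum. Thus, $R_k := \mathcal{E}\{u_\ell u_{\ell-k}^*\} = R_{-k}^*$ and

$$R_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-jk\theta}\, d\mu(\theta), \quad \text{for } k\in\mathbb{Z},$$

where $\mathcal{E}$ denotes expectation and "$*$" denotes complex conjugation. We are interested in quadratic optimization problems with respect to the usual inner product

$$\Big\langle \sum_k a_k u_k, \sum_\ell b_\ell u_\ell \Big\rangle := \mathcal{E}\Big\{\Big(\sum_k a_k u_k\Big)\Big(\sum_\ell b_\ell u_\ell\Big)^*\Big\} = \sum_{k,\ell} a_k R_{k-\ell}\, b_\ell^*. \qquad (1)$$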

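The closure of $\operatorname{span}\{u_k : k\in\mathbb{Z}\}$, which we denote by $\mathcal{U}$, can be identified with the space $L_{2,d\mu}[-\pi,\pi)$ of functions which are square integrable with respect to $d\mu(\theta)$, with inner product

$$\langle a, b\rangle_{d\mu} := \frac{1}{2\pi}\int_{-\pi}^{\pi} a(\theta)\, b(\theta)^*\, d\mu(\theta),$$

where $a(\theta)=\sum_k a_k e^{-jk\theta}$ and $b(\theta)=\sum_k b_k e^{-jk\theta}$. Further, the correspondence

$$\sum_k a_k u_k \;\mapsto\; \sum_k a_k e^{-jk\theta}$$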

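is a Hilbert space isomorphism (see [14]). Thus, least-variance approximation problems can be equivalently expressed in $L_{2,d\mu}$, with norm $\|a\|_{d\mu} := \langle a, a\rangle_{d\mu}^{1/2}$. In particular, the variance of the one-step-ahead prediction error $u_0 - \hat u_{0|\rm past}$ for the predictor

$$\hat u_{0|\rm past} = \sum_{k>0}\alpha_k u_{-k}$$

is

$$\mathcal{E}\{|u_0 - \hat u_{0|\rm past}|^2\} = \Big\|1 - \sum_{k>0}\alpha_k e^{jk\theta}\Big\|^2_{d\mu}. \qquad (2)$$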
Similarly, the variance of the error of the smoothing filter

$$\hat u_{0|\rm past\,\&\,future} := \sum_{k\neq 0}\beta_k u_{-k} \qquad (3)$$

is simply

$$\mathcal{E}\{|u_0 - \hat u_{0|\rm past\,\&\,future}|^2\} = \Big\|1 - \sum_{k\neq 0}\beta_k e^{jk\theta}\Big\|^2_{d\mu}.$$

In general, the power spectrum $d\mu$ is a bounded nonnegative measure on $[-\pi,\pi)$ and admits a decomposition $d\mu = d\mu_s + f\,d\theta$, with $d\mu_s$ a singular measure and $f\,d\theta$ the absolutely continuous part of $d\mu$ (with respect to the Lebesgue measure). In general the singular part has no effect on the minimal variance of the error, and the corresponding component of $u_k$ can be estimated with arbitrary accuracy using any "one-sided" infinite past. The variance of the optimal one-step-ahead prediction error depends only on the absolutely continuous part of the power spectrum and is given by the celebrated Szegő-Kolmogorov formula stated below (see [17] and also [10, page 183], [18, Chapter 6], [11], [16]).

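Theorem 1: With $d\mu = d\mu_s + f\,d\theta$ as above,

$$\inf_{\alpha_k,\,k>0}\Big\|1-\sum_{k>0}\alpha_k e^{jk\theta}\Big\|^2_{d\mu} = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log f(\theta)\,d\theta\right\}$$

when $\log f\in L_1[-\pi,\pi)$, and zero otherwise.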
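As a numerical illustration of Theorem 1, the sketch below assumes a hypothetical AR(1) density $f(\theta)=\sigma^2/|1-ae^{j\theta}|^2$ with $a=0.8$ and $\sigma^2=2$, for which the optimal predictor is the single tap $\alpha_1=a$. Each integral $\frac{1}{2\pi}\int_{-\pi}^{\pi}(\cdot)\,d\theta$ is replaced by an average over a uniform grid, and all function names are illustrative. The geometric mean of $f$ should agree with the error variance (2) of the optimal tap, while a mis-tuned tap should do worse.

```python
import cmath
import math

N = 4096  # grid size; the rectangle rule is very accurate for smooth periodic integrands
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def ar1_spectrum(a, sigma2):
    """Hypothetical AR(1) density f(theta) = sigma2 / |1 - a e^{j theta}|^2 with |a| < 1."""
    return [sigma2 / abs(1 - a * cmath.exp(1j * t)) ** 2 for t in THETAS]


def geometric_mean(f):
    """Grid approximation of exp{(1/2pi) int log f(theta) dtheta}."""
    return math.exp(sum(math.log(v) for v in f) / len(f))


def prediction_error_variance(alphas, f):
    """Grid approximation of (2) for the taps alpha_1, alpha_2, ... given in `alphas`."""
    total = 0.0
    for t, v in zip(THETAS, f):
        err = 1 - sum(alpha * cmath.exp(1j * k * t) for k, alpha in enumerate(alphas, start=1))
        total += abs(err) ** 2 * v
    return total / len(f)


f = ar1_spectrum(0.8, 2.0)
g_f = geometric_mean(f)                          # right-hand side of Theorem 1
optimal = prediction_error_variance([0.8], f)    # alpha_1 = a, all other taps zero
mistuned = prediction_error_variance([0.5], f)   # a deliberately wrong tap
print(g_f, optimal, mistuned)
```

With these parameters the first two printed values should both be very close to $\sigma^2=2$, while the mis-tuned tap $\alpha_1=0.5$ gives about $2.5$, in agreement with $R_0(1+b^2-2ab)$ for $R_0=\sigma^2/(1-a^2)$ and $b=0.5$.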

In case $\log f\in L_1[-\pi,\pi)$ the prediction-error variance is nonzero and the random process is nondeterministic in the sense of Kolmogorov. In this case, it can be shown that

$$f(\theta) = \frac{g_f}{|a_f(e^{j\theta})|^2}, \qquad (4)$$

where $a_f(z)$ is an outer function in the Hardy space $H_2(\mathbb{D})$ with $a_f(0)=1$, i.e.,

$$a_f(z) = 1 + a_1 z + a_2 z^2 + \ldots$$

is analytic in the unit disc $\mathbb{D} := \{z : |z|<1\}$ and its radial limits are square integrable (see [15]). Then, the linear combination

$$\hat u_{0|\rm past} := \sum_{k>0}(-a_k)\,u_{-k} \qquad (5)$$

serves as the optimal predictor of $u_0$ based on past observations, and the least variance of the optimal prediction error becomes

$$\mathcal{E}\Big\{\Big|u_0 - \sum_{k>0}(-a_k)u_{-k}\Big|^2\Big\} = \lim_{r\nearrow 1}\frac{1}{2\pi}\int_{-\pi}^{\pi}|a_f(re^{j\theta})|^2 f(\theta)\,d\theta = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log f(\theta)\,d\theta\right\} =: g_f.$$

Analogous expressions exist for the optimal smoothing error and the corresponding smoothing filter which uses both past and future values of $u_k$. It is quite interesting, and rather straightforward, that while the variance of the optimal one-step-ahead prediction error is the geometric mean of the spectral density function, the variance of the error when a smoothing filter utilizes both past and future turns out to be the harmonic mean of the spectral density function.
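Theorem 2 (see [9]): With $d\mu = d\mu_s + f\,d\theta$ as above,

$$\inf_{\beta_k,\,k\neq 0}\Big\|1-\sum_{k\neq0}\beta_k e^{jk\theta}\Big\|^2_{d\mu} = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)^{-1}\,d\theta\right)^{-1} =: h_f \qquad (6)$$

when $f^{-1}\in L_1[-\pi,\pi)$, and zero otherwise.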

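In case $f^{-1}\in L_1[-\pi,\pi)$ the variance of the optimal smoothing error is nonzero and the random process is nondeterministic in the sense that past and future together do not determine the present, i.e., the present cannot be estimated from them with zero variance. In this case (see [9])

$$b_f(\theta) = \ldots + b_2e^{-j2\theta} + b_1e^{-j\theta} + 1 + b_{-1}e^{j\theta} + b_{-2}e^{j2\theta} + \ldots = h_f\, f(\theta)^{-1}$$

is the image of the optimal smoothing error $u_0 - \sum_{k\neq0}(-b_k)u_k$ under the Kolmogorov map, and

$$\mathcal{E}\Big\{\Big|u_0-\sum_{k\neq0}(-b_k)u_k\Big|^2\Big\} = \frac{1}{2\pi}\int_{-\pi}^{\pi}|b_f(\theta)|^2 f(\theta)\,d\theta = \frac{h_f^2}{2\pi}\int_{-\pi}^{\pi}f(\theta)^{-2}f(\theta)\,d\theta = h_f.$$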
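Theorem 2 can be checked on the same kind of hypothetical AR(1) density $f(\theta)=\sigma^2/|1-ae^{j\theta}|^2$ with real $a$. Here $1/f$ is a trigonometric polynomial of degree one, so $b_f=h_f f^{-1}=1-\frac{a}{1+a^2}(e^{j\theta}+e^{-j\theta})$ and the optimal smoother uses only $u_{\pm1}$ with weight $a/(1+a^2)$. The sketch below (grid averages in place of integrals, illustrative names) confirms that its error variance equals the harmonic mean $h_f=\sigma^2/(1+a^2)$.

```python
import cmath
import math

N = 4096
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def ar1_spectrum(a, sigma2):
    """Hypothetical AR(1) density f(theta) = sigma2 / |1 - a e^{j theta}|^2."""
    return [sigma2 / abs(1 - a * cmath.exp(1j * t)) ** 2 for t in THETAS]


def harmonic_mean(f):
    """Grid approximation of h_f = ((1/2pi) int f^{-1} dtheta)^{-1}, cf. (6)."""
    return len(f) / sum(1.0 / v for v in f)


def two_tap_smoothing_error(beta, f):
    """Error variance of u_0 - beta (u_1 + u_{-1}); its Kolmogorov image is 1 - 2 beta cos(theta)."""
    return sum((1 - 2 * beta * math.cos(t)) ** 2 * v for t, v in zip(THETAS, f)) / len(f)


a, sigma2 = 0.8, 2.0
f = ar1_spectrum(a, sigma2)
h_f = harmonic_mean(f)
beta_opt = a / (1 + a * a)      # read off from b_f = h_f / f for this particular f
optimal = two_tap_smoothing_error(beta_opt, f)
print(h_f, optimal, sigma2 / (1 + a * a))
```

All three printed numbers should agree closely, and they are smaller than the prediction-error variance $g_f=\sigma^2$ of the same process, as one expects when the future is also available.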



III. Degradation of the prediction error variance 

We now consider two distinct spectral density functions $f_1, f_2$ and postulate a situation where filtering of an underlying random process is attempted based on the incorrect choice between these two alternatives. The variance is then compared with the least possible variance, which is achieved when the correct choice is made (i.e., when the predictor is optimal for the spectral density against which it is being evaluated). The degradation of performance is quantified by how much the ratio of the two prediction-error variances exceeds one. This ratio serves as a measure of mismatch between the two spectral densities (the one which was used to design the predictor and the one against which it is being evaluated). The resulting mismatch turns out to be scale-invariant, i.e., the expression is homogeneous of degree zero in each argument. Hence, as a measure of distance it actually quantifies distance between the positive rays that the two spectral density functions define, and thus it quantifies distance between the respective "shapes." It turns out that this distance is convex on logarithmic intervals and has a number of distance-like properties, short of being a metric.

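Let us assume that both $\log f_1, \log f_2\in L_1[-\pi,\pi)$ and hence, that

$$f_i(\theta) = \frac{g_{f_i}}{|a_{f_i}(e^{j\theta})|^2}$$

for corresponding outer $H_2$-functions $a_{f_i}(z)$ normalized as before so that $a_{f_i}(0)=1$, for $i\in\{1,2\}$. Obviously,

$$g_{f_i} = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log f_i(\theta)\,d\theta\right\}$$

denotes the geometric mean of $f_i$ as before, for $i\in\{1,2\}$. These expressions represent the least variances when the predictor is chosen on the basis of the correct spectral density function. If, however, the predictor is based on $f_2$ whereas the underlying process has $f_1$ as its spectral density, then, since $|a_{f_2}(e^{j\theta})|^2 = g_{f_2}/f_2(\theta)$, the variance of the prediction error turns out to be

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|a_{f_2}(e^{j\theta})|^2 f_1(\theta)\,d\theta = g_{f_2}\,\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1(\theta)}{f_2(\theta)}\,d\theta. \qquad (7)$$

If we divide this variance by the optimal value $g_{f_1}$ we obtain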



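$$\rho_{a/g}(f_1,f_2) := \frac{\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1(\theta)}{f_2(\theta)}\,d\theta}{\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\frac{f_1(\theta)}{f_2(\theta)}\,d\theta\right\}}. \qquad (8)$$

This is the ratio of the arithmetic mean over the geometric mean of the fraction $f_1/f_2$ of the two spectral density functions. The expression $\rho_{a/g}(f_1,f_2)$ is not symmetric in the two arguments. The subscript "a/g" signifies the ratio of arithmetic over geometric means.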
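The identity behind (8) can be checked numerically. The sketch below assumes two hypothetical AR(1) densities, $f_1$ with $a=0.8,\ \sigma^2=2$ (the "actual" process) and $f_2$ with $a=0.5,\ \sigma^2=1$ (the design model), so that the $f_2$-optimal predictor is the single tap $\alpha_1=0.5$. It evaluates that predictor against $f_1$ directly through (2), divides by $g_{f_1}$, and compares with $\rho_{a/g}(f_1,f_2)$ computed from (8). The helpers `rho_ag` and `delta_ag` are illustrative names, and grid averages stand in for the integrals.

```python
import cmath
import math

N = 4096
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def ar1_spectrum(a, sigma2):
    """Hypothetical AR(1) density sigma2 / |1 - a e^{j theta}|^2 sampled on THETAS."""
    return [sigma2 / abs(1 - a * cmath.exp(1j * t)) ** 2 for t in THETAS]


def rho_ag(f1, f2):
    """Grid version of (8): arithmetic over geometric mean of the fraction f1/f2."""
    ratio = [x / y for x, y in zip(f1, f2)]
    arithmetic = sum(ratio) / len(ratio)
    geometric = math.exp(sum(math.log(r) for r in ratio) / len(ratio))
    return arithmetic / geometric


def delta_ag(f1, f2):
    """delta_{a/g}(f1, f2) = log rho_{a/g}(f1, f2), cf. (9)."""
    return math.log(rho_ag(f1, f2))


f1 = ar1_spectrum(0.8, 2.0)   # "actual" process
f2 = ar1_spectrum(0.5, 1.0)   # model used to design the predictor

# Direct route: the f2-optimal predictor has the single tap alpha_1 = 0.5; evaluate it against f1.
mismatched = sum(abs(1 - 0.5 * cmath.exp(1j * t)) ** 2 * v for t, v in zip(THETAS, f1)) / N
optimal = math.exp(sum(math.log(v) for v in f1) / N)   # g_{f1}, by Theorem 1
print(mismatched / optimal, rho_ag(f1, f2), delta_ag(f1, f2))
```

Both routes give a degradation ratio of about $1.25$ for these parameters. Replacing `f1` by a scaled copy such as `[3 * v for v in f1]` leaves `rho_ag` unchanged, which is the scale invariance noted at the start of this section, and swapping the arguments gives a different value, reflecting the asymmetry of $\rho_{a/g}$.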

The logarithm of $\rho_{a/g}(f_1,f_2)$ is nonnegative and defines a notion of distance between rays of density functions. Henceforth, we denote this logarithm by

$$\delta_{a/g}(f_1,f_2) := \log\big(\rho_{a/g}(f_1,f_2)\big). \qquad (9)$$

Alternatively, we can view the above as the slackness of a Jensen-type inequality.

Before we discuss key properties of $\delta_{a/g}$, we introduce a natural class of paths connecting density functions: for any two density functions $f_a, f_b$,

$$f_{\tau,a,b} := f_a^{1-\tau} f_b^{\tau}, \quad \text{for } \tau\in[0,1],$$

defines a logarithmic interval between $f_a$ and $f_b$. The terminology stems from the fact that, whenever the needed logarithms exist,

$$f_{\tau,a,b} = e^{(1-\tau)\log(f_a) + \tau\log(f_b)}, \quad \text{for } \tau\in[0,1].$$

Later on we will see that these represent geodesics on the manifold of density functions with respect to an induced pseudo-Riemannian metric.
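A logarithmic interval is straightforward to evaluate on sampled densities. The helper `log_interval` and the four sample values below are hypothetical; the sketch only confirms that $f_a^{1-\tau}f_b^{\tau}$ coincides with the exponential of the convex combination of the logarithms, and that the endpoints and the midpoint behave as expected.

```python
import math


def log_interval(fa, fb, tau):
    """Pointwise f_{tau,a,b} = fa^(1 - tau) * fb^tau for sampled positive densities."""
    return [x ** (1 - tau) * y ** tau for x, y in zip(fa, fb)]


# Hypothetical samples of two positive densities at four frequencies.
fa = [1.0, 2.0, 4.0, 0.5]
fb = [3.0, 1.0, 0.25, 2.0]

tau = 0.3
direct = log_interval(fa, fb, tau)
# Same point written as the exponential of a convex combination of logarithms.
via_logs = [math.exp((1 - tau) * math.log(x) + tau * math.log(y)) for x, y in zip(fa, fb)]
print(direct, via_logs)
```

At $\tau=0$ and $\tau=1$ the helper returns $f_a$ and $f_b$, and at $\tau=\tfrac12$ it returns the pointwise geometric mean $\sqrt{f_af_b}$.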

Proposition 3: Let $f_i$, $i\in\{1,2,3\}$, represent density functions defined on $[-\pi,\pi)$. The following hold:

(i) $\delta_{a/g}(f_1,f_2)\in\mathbb{R}_+\cup\{\infty\}$.

(ii) $\delta_{a/g}(f_1,f_2) = 0 \Leftrightarrow f_1(\theta)/f_2(\theta)$ is constant.

(iii) $\delta_{a/g}(f_1, f_{\tau,1,b})$ is monotonically increasing for $\tau\in[0,1]$ and $b\in\{2,3\}$.

(iv) $\delta_{a/g}(f_1, f_{\tau,2,3})$ is convex in $\tau$.

Proof: Claims (i-ii) follow from the fact that the arithmetic mean of a positive function is never smaller than its geometric mean, and that the two are equal only when the function is constant. In particular, the ordering, as to which is larger, follows from Jensen's inequality

$$\exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log f(\theta)\,d\theta\right\} \le \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)\,d\theta,$$

for any $f(\theta)>0$. The fact that the two are equal only when $f(\theta)$ is constant can be obtained easily using a variational argument. Then (i-ii) follow when we substitute $f = f_1/f_2$ and then take the logarithm.
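Next we show (iv) and use it to deduce (iii). Since $f_1/f_{\tau,a,b} = (f_1/f_a)(f_a/f_b)^{\tau}$,

$$\delta_{a/g}(f_1, f_{\tau,a,b}) = \log\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1(\theta)}{f_a(\theta)}\left(\frac{f_a(\theta)}{f_b(\theta)}\right)^{\tau}d\theta\right) - \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left(\frac{f_1(\theta)}{f_a(\theta)}\left(\frac{f_a(\theta)}{f_b(\theta)}\right)^{\tau}\right)d\theta,$$

the derivative with respect to $\tau$ becomes

$$\frac{d}{d\tau}\delta_{a/g}(f_1, f_{\tau,a,b}) = \frac{\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1}{f_{\tau,a,b}}\log\frac{f_a}{f_b}\,d\theta}{\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1}{f_{\tau,a,b}}\,d\theta} - \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\frac{f_a}{f_b}\,d\theta,$$

and the second derivative,

$$\frac{d^2}{d\tau^2}\delta_{a/g}(f_1, f_{\tau,a,b}) = \frac{\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1}{f_{\tau,a,b}}\left(\log\frac{f_a}{f_b}\right)^2d\theta\cdot\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1}{f_{\tau,a,b}}\,d\theta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1}{f_{\tau,a,b}}\log\frac{f_a}{f_b}\,d\theta\right)^2}{\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1}{f_{\tau,a,b}}\,d\theta\right)^2}.$$

But from Cauchy's inequality we have that

$$\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1}{f_{\tau,a,b}}\log\frac{f_a}{f_b}\,d\theta\right)^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1}{f_{\tau,a,b}}\,d\theta\cdot\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1}{f_{\tau,a,b}}\left(\log\frac{f_a}{f_b}\right)^2d\theta.$$

Hence, the second derivative is nonnegative. Claim (iv) is seen to hold true after we set $a=2$ and $b=3$. To establish claim (iii), set $a=1$ and $b\in\{2,3\}$ in the above. Then $\tau\mapsto\delta_{a/g}(f_1, f_{\tau,1,b})$ is convex. But $\delta_{a/g}(f_1, f_{\tau,1,b}) = 0$ for $\tau=0$, and $\delta_{a/g}(f_1, f_{\tau,1,b})\ge0$ for all $\tau$. Hence, the derivative at $\tau=0$ must be nonnegative (in fact, since $f_1/f_{\tau,1,b} = (f_1/f_b)^{\tau}$, the expression for the derivative shows that it equals zero there), and by convexity $\delta_{a/g}(f_1, f_{\tau,1,b})$ must increase as $\tau\nearrow1$. This completes the proof of (iii), and the proof of the proposition. ■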
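A quick numerical check of claims (iii) and (iv) is sketched below. It assumes three hypothetical AR(1) densities, replaces integrals by grid averages, samples $\tau$ uniformly, and inspects first differences for (iii) and second differences for (iv). Helper names are illustrative, and the check illustrates rather than replaces the proof.

```python
import cmath
import math

N = 2048
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def ar1_spectrum(a, sigma2):
    """Hypothetical AR(1) density sigma2 / |1 - a e^{j theta}|^2 sampled on THETAS."""
    return [sigma2 / abs(1 - a * cmath.exp(1j * t)) ** 2 for t in THETAS]


def delta_ag(f1, f2):
    """Grid version of (9): log of the arithmetic mean minus the mean of the log, for f1/f2."""
    ratio = [x / y for x, y in zip(f1, f2)]
    return math.log(sum(ratio) / len(ratio)) - sum(math.log(r) for r in ratio) / len(ratio)


def log_interval(fa, fb, tau):
    """Pointwise f_{tau,a,b} = fa^(1 - tau) * fb^tau."""
    return [x ** (1 - tau) * y ** tau for x, y in zip(fa, fb)]


f1, f2, f3 = ar1_spectrum(0.8, 1.0), ar1_spectrum(-0.5, 2.0), ar1_spectrum(0.3, 0.5)
taus = [k / 20 for k in range(21)]

d_iii = [delta_ag(f1, log_interval(f1, f2, t)) for t in taus]   # claim (iii) with b = 2
d_iv = [delta_ag(f1, log_interval(f2, f3, t)) for t in taus]    # claim (iv)

increments = [later - earlier for earlier, later in zip(d_iii, d_iii[1:])]
second_diffs = [d_iv[k - 1] - 2 * d_iv[k] + d_iv[k + 1] for k in range(1, len(d_iv) - 1)]
print(min(increments), min(second_diffs))
```

Because Cauchy's inequality also holds for grid averages, the discretized quantities inherit the properties exactly, so both printed minima should be nonnegative up to rounding.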
Since $\delta_{a/g}(f_1,f_2)$ is not symmetric in its arguments, it is quite natural to consider the symmetrized version

$$\delta(f_1,f_2) := \delta_{a/g}(f_1,f_2) + \delta_{a/g}(f_2,f_1).$$

All properties listed in Proposition 3 hold true for $\delta(f_1,f_2)$ as well. Furthermore, interestingly, since the geometric-mean terms of the two summands cancel,

$$\delta(f_1,f_2) = \log\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_1(\theta)}{f_2(\theta)}\,d\theta\cdot\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_2(\theta)}{f_1(\theta)}\,d\theta\right),$$

which is the logarithm of the ratio of the arithmetic mean over the harmonic mean of the "likelihood" fraction $f_1/f_2$. Again, the distance of this ratio from one quantifies how far $f_1/f_2$ is from being constant. We now summarize the claimed properties of $\delta(\cdot,\cdot)$.
Proposition 4: Let $f_i$, $i\in\{1,2,3\}$, be density functions on $[-\pi,\pi)$. The following hold:

(i) $\delta(f_1,f_2)\in\mathbb{R}_+\cup\{\infty\}$.

(ii) $\delta(f_1,f_2) = 0 \Leftrightarrow f_1(\theta)/f_2(\theta)$ is constant.

(iii) $\delta(f_1, f_{\tau,1,b})$ is monotonically increasing for $\tau\in[0,1]$ and $b\in\{2,3\}$.

(iv) $\delta(f_1, f_{\tau,2,3})$ is convex in $\tau$.

Proof: Properties (i), (ii), and (iv) are a direct consequence of the corresponding properties given in Proposition 3 for $\delta_{a/g}(\cdot,\cdot)$. Property (iii), on the other hand, follows as before from (iv) and the fact that the derivative of $\delta(f_1, f_{\tau,1,b})$ at $\tau=0$ is zero. ■
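The arithmetic-over-harmonic form of $\delta$ can be confirmed numerically. The sketch below assumes the same two hypothetical AR(1) densities used earlier for (8), computes $\delta$ both as the sum of the two $\delta_{a/g}$ terms and as $\log$ of the product of the two arithmetic means, and checks symmetry and insensitivity to scaling. Helper names are illustrative, and grid averages replace the integrals.

```python
import cmath
import math

N = 2048
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def ar1_spectrum(a, sigma2):
    """Hypothetical AR(1) density sigma2 / |1 - a e^{j theta}|^2 sampled on THETAS."""
    return [sigma2 / abs(1 - a * cmath.exp(1j * t)) ** 2 for t in THETAS]


def mean(values):
    return sum(values) / len(values)


def delta_ag(f1, f2):
    ratio = [x / y for x, y in zip(f1, f2)]
    return math.log(mean(ratio)) - mean([math.log(r) for r in ratio])


def delta_sym(f1, f2):
    """Arithmetic-over-harmonic form: log( mean(f1/f2) * mean(f2/f1) )."""
    return math.log(mean([x / y for x, y in zip(f1, f2)]) * mean([y / x for x, y in zip(f1, f2)]))


f1, f2 = ar1_spectrum(0.8, 2.0), ar1_spectrum(0.5, 1.0)
print(delta_sym(f1, f2), delta_ag(f1, f2) + delta_ag(f2, f1))
```

For these parameters both expressions evaluate to about $\log 1.4$, the two arithmetic means of the fraction and of its inverse being $2.5$ and $0.56$.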



IV. An example 

In order to illustrate the quantitative behavior of these measures, we consider three specific power spectra labeled $f_1, f_2, f_3$, as before. These are shown in Fig. 1. We then consider the triangle formed with those power spectra as vertices and connected using logarithmic intervals. The interior of the triangle is similarly sampled at logarithmically placed points. In essence, we consider the family of power spectral densities

$$f = f_1^{(1-\tau)}\left(f_2^{(1-\sigma)}f_3^{\sigma}\right)^{\tau}, \quad \text{for } \tau,\sigma\in[0,1]. \qquad (10)$$

For each value of $\tau, \sigma$ (sampled appropriately), we evaluate $\delta_{a/g}(f_1,f)$ and $\delta(f_1,f)$ and compare these to the Kullback-Leibler divergence between the suitably normalized functions

$$\hat f_k(\theta) = \frac{f_k(\theta)}{\frac{1}{2\pi}\int_{-\pi}^{\pi}f_k(\theta)\,d\theta}, \quad k\in\{1,2,3\}.$$

The normalization is necessary if the Kullback-Leibler divergence is to have the properties of a distance measure (i.e., nonnegative when its arguments are different, etc.). Thus, we denote



$$\delta_{\rm KL}(f_1,f_2) := \frac{1}{2\pi}\int_{-\pi}^{\pi}\hat f_1(\theta)\log\left(\frac{\hat f_1(\theta)}{\hat f_2(\theta)}\right)d\theta. \qquad (11)$$



The set of power spectra in (10) is thought of as a set of points forming an equilateral triangle, conceptually sitting on the $xy$-plane. Then, the vertical axis represents distance from $f_1$, measured using these three alternative measures. The corresponding surfaces are drawn in the accompanying figures. The three power spectra used are shown in Fig. 1.

Fig. 1. Three power spectral densities $f_1$ (-), $f_2$ (- -), $f_3$ (--)




There appears to be little qualitative difference between $\delta(\cdot,\cdot)$, $\delta_{a/g}(\cdot,\cdot)$, and $\delta_{\rm KL}(\cdot,\cdot)$. They are also quite similar in that it is easy to calculate functional forms for minimizers of any of these distance measures under moment constraints (see Section V below). Hence, it is important to underscore that $\delta_{\rm KL}(\cdot,\cdot)$ lacks an intrinsic interpretation as a distance measure between power spectra, in contrast to $\delta(\cdot,\cdot)$, which may therefore be preferable for exactly that reason.
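A comparison of the three measures along one edge of the triangle can be sketched as follows. The sketch substitutes two hypothetical AR(1) densities for the spectra of Fig. 1 and evaluates $\delta_{a/g}(f_1,f)$, $\delta(f_1,f)$ and $\delta_{\rm KL}(f_1,f)$ for $f=f_{\tau,1,2}$, the logarithmic interval from $f_1$ to $f_2$. Grid averages replace the integrals, and all names are illustrative.

```python
import cmath
import math

N = 2048
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def ar1_spectrum(a, sigma2):
    """Hypothetical AR(1) density sigma2 / |1 - a e^{j theta}|^2 sampled on THETAS."""
    return [sigma2 / abs(1 - a * cmath.exp(1j * t)) ** 2 for t in THETAS]


def mean(values):
    return sum(values) / len(values)


def delta_ag(f1, f2):
    r = [x / y for x, y in zip(f1, f2)]
    return math.log(mean(r)) - mean([math.log(v) for v in r])


def delta_sym(f1, f2):
    return delta_ag(f1, f2) + delta_ag(f2, f1)


def delta_kl(f1, f2):
    """Grid version of (11): KL divergence after normalizing each density to unit mean."""
    n1, n2 = mean(f1), mean(f2)
    return mean([(x / n1) * math.log((x / n1) / (y / n2)) for x, y in zip(f1, f2)])


f1, f2 = ar1_spectrum(0.8, 1.0), ar1_spectrum(-0.6, 1.0)   # hypothetical stand-ins
for k in range(5):
    tau = k / 4
    f = [x ** (1 - tau) * y ** tau for x, y in zip(f1, f2)]   # logarithmic interval f_{tau,1,2}
    print(f"tau={tau:.2f}  a/g={delta_ag(f1, f):.4f}  sym={delta_sym(f1, f):.4f}  KL={delta_kl(f1, f):.4f}")
```

All three columns start at zero and grow along the interval, in line with the qualitative similarity noted above; the normalization inside `delta_kl` is what makes it insensitive to rescaling of either argument.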

V. Functional form of minimizers in moment problems 

A large class of spectral analysis problems is typified by the trigonometric moment problem, where a power spectral density is sought to match a partial sequence of autocorrelation samples, i.e., a positive function $f$ is sought such that

$$R_k = \frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-jk\theta}f(\theta)\,d\theta, \quad \text{for } k\in\{0,1,\ldots,n\}, \qquad (12)$$

see e.g., [10], [5], [11], [6], [12]. Since, in general, the family of consistent $f$'s is large, a particular one is chosen "closest" to a given "prior". Maximum entropy spectral analysis, for instance, can be interpreted as seeking the spectral density closest in the Kullback-Leibler sense to one which is flat, i.e., the prior in this case is the power spectral density of white noise (see e.g., [12]). In the same spirit we may pose the problem of seeking $f$ closest to $f_{\rm prior}$ in the sense of minimizing, e.g., $\delta_{a/g}(f, f_{\rm prior})$ subject to the moment constraints (12).

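To this end, as usual, we introduce Lagrange multipliers $\lambda_k$ ($k\in\{0,1,\ldots,n\}$, with $\lambda_{-k} := \lambda_k^*$ so that the sum below is real) and form the Lagrangian

$$\mathcal{L}(f,\lambda_0,\ldots,\lambda_n) := \log\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f(\theta)}{f_{\rm prior}(\theta)}\,d\theta\right) - \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\frac{f(\theta)}{f_{\rm prior}(\theta)}\,d\theta - \sum_{k=-n}^{n}\lambda_k\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-jk\theta}f(\theta)\,d\theta - R_k\right).$$

Setting the variation of $\mathcal{L}$ identically to zero for all perturbations of $f$ gives conditions that help identify the functional form of minimizing $f$'s. Briefly, for a perturbation $f\to f+\Delta$,

$$\delta\mathcal{L} = \frac{\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\Delta(\theta)}{f_{\rm prior}(\theta)}\,d\theta}{\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f(\theta)}{f_{\rm prior}(\theta)}\,d\theta} - \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\Delta(\theta)}{f(\theta)}\,d\theta - \sum_{k=-n}^{n}\lambda_k\,\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-jk\theta}\Delta(\theta)\,d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left(\frac{1}{\kappa f_{\rm prior}(\theta)} - \frac{1}{f(\theta)} - \sum_{k=-n}^{n}\lambda_k e^{-jk\theta}\right)\Delta(\theta)\,d\theta$$

after we eliminate higher order terms, where $\kappa$ is given in (14) below. Stationarity conditions require that the above is identically zero for all (small) functions $\Delta$. This leads to

$$\frac{1}{\kappa f_{\rm prior}(\theta)} - \frac{1}{f(\theta)} - \sum_{k=-n}^{n}\lambda_k e^{-jk\theta} = 0,$$

from which we deduce that a minimizing $f$ must be of the form

$$f(\theta) = \frac{\kappa f_{\rm prior}(\theta)}{1 - \kappa f_{\rm prior}(\theta)\sum_{k=-n}^{n}\lambda_k e^{-jk\theta}} \qquad (13)$$

with

$$\kappa = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f(\theta)}{f_{\rm prior}(\theta)}\,d\theta. \qquad (14)$$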

Then, values for $\kappa$ as well as for the Lagrange multipliers must be determined so that $f$ in (13) satisfies (12) and (14); this can be done, for instance, using the homotopy methods in [7], [8]. It is interesting that when $f_{\rm prior} = 1$, the minimizer is the same as the one obtained by applying the maximum entropy principle (e.g., see [12], [11]), i.e., it turns out to be an all-pole spectral density function, which is of course uniquely identified by the moment constraints. Evidently, in general, minimizing $\delta_{a/g}(f, f_{\rm prior})$ gives a different answer than the one obtained by minimizing $\delta_{\rm KL}(f, f_{\rm prior})$ or by minimizing other distances. Yet, all such problems are similar and can be dealt with in two steps: first identify the functional form of a minimizer, and then determine values for the coefficients so as to satisfy (12). The latter step requires solving a nonlinear problem in general, and can be approached in a variety of ways (e.g., as in [7], [8]).
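The functional form (13) can be checked in the case $f_{\rm prior}=1$ on a hypothetical AR(1) density $f(\theta)=\sigma^2/|1-ae^{j\theta}|^2$, which is all-pole with $n=1$. Working the stationarity condition by hand for this $f$ gives $\kappa=\sigma^2/(1-a^2)$, $\lambda_0=-2a^2/\sigma^2$ and $\lambda_{\pm1}=a/\sigma^2$. The sketch plugs these values into (13) and confirms that the AR(1) density, together with its moments $R_0=\sigma^2/(1-a^2)$ and $R_1=aR_0$, is recovered. It only verifies the form of a stationary point; it does not solve the moment problem. Parameter values and names are assumptions for illustration.

```python
import cmath
import math

N = 4096
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]
a, sigma2 = 0.6, 1.5   # hypothetical AR(1) parameters, n = 1

f = [sigma2 / abs(1 - a * cmath.exp(1j * t)) ** 2 for t in THETAS]
f_prior = [1.0] * N     # flat prior, i.e., white noise

# (14): kappa is the grid average of f / f_prior
kappa = sum(x / p for x, p in zip(f, f_prior)) / N

# Multipliers worked out by hand from the stationarity condition for this particular f:
# 1/f = 1/(kappa f_prior) - sum_k lambda_k e^{-jk theta}
lambdas = {0: -2 * a * a / sigma2, 1: a / sigma2, -1: a / sigma2}

# (13): rebuild f from kappa, the multipliers and the prior
f_form = []
for t, p in zip(THETAS, f_prior):
    s = sum(lam * cmath.exp(-1j * k * t) for k, lam in lambdas.items())
    f_form.append((kappa * p / (1 - kappa * p * s)).real)

# Moments (12) of the rebuilt density for k = 0, 1
R = [sum(cmath.exp(-1j * k * t) * v for t, v in zip(THETAS, f_form)) / N for k in range(2)]
print(max(abs(x - y) for x, y in zip(f, f_form)), R[0].real, R[1].real)
```

The first printed number is the largest pointwise discrepancy between the AR(1) density and the expression (13), which should be at rounding level.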



VI. Riemannian metrics and geodesics

Infinitesimal perturbations about a given power spectral density function, when measured by any of $\delta_{a/g}(\cdot,\cdot)$, $\delta(\cdot,\cdot)$, or $\delta_{\rm KL}(\cdot,\cdot)$, give rise to nonnegative definite quadratic forms. These forms are in fact nonsingular on directions other than rays emanating from the origin. This is due to the fact that the aforementioned distances do not separate points on such rays while they give nonzero distance otherwise. They thus induce Riemannian metrics on a suitably defined manifold of spectral rays. In this section (and the current paper) we focus on the particular metric induced by $\delta(\cdot,\cdot)$. We show how to characterize geodesics, and verify that logarithmic intervals are in fact geodesics.
Throughout, we assume that all functions are smooth enough so that the indicated integrals exist. This, in particular, can be ensured if all spectral density functions are bounded and have bounded inverses as well as bounded derivatives. Weaker conditions are clearly possible. For the purposes of this section we define

$$\mathcal{F} := \left\{f : f(\theta) \text{ differentiable on } [-\pi,\pi], \text{ with } f(\theta)>0, \text{ and both } f(\theta),\ \tfrac{df(\theta)}{d\theta} \text{ square integrable}\right\}.$$

With a suitable norm on $f$ and its derivative, $\mathcal{F}$ becomes a (Banach) manifold. We also recall the definition

$$\|f\|_k := \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(\theta)|^k\,d\theta\right)^{1/k}$$

of the $k$-th norm, applicable to any $f$ on $[-\pi,\pi]$ provided the integral exists. Whenever $f$ is a density function, the absolute value is obviously unnecessary.
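Proposition 5: Let $f, f+\Delta\in\mathcal{F}$, where $\Delta$ is a perturbation such that $|\Delta/f|<1$. Then

$$\delta_{a/g}(f, f+\Delta) = \frac{1}{2}\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\left(\frac{\Delta(\theta)}{f(\theta)}\right)^2d\theta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\Delta(\theta)}{f(\theta)}\,d\theta\right)^2\right) + O(\|\Delta/f\|_3^3),$$

$$\delta(f, f+\Delta) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left(\frac{\Delta(\theta)}{f(\theta)}\right)^2d\theta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\Delta(\theta)}{f(\theta)}\,d\theta\right)^2 + O(\|\Delta/f\|_3^3),$$

$$\delta_{\rm KL}(f, f+\Delta) = \frac{1}{2}\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\hat f(\theta)\left(\frac{\Delta(\theta)}{f(\theta)}\right)^2d\theta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\hat f(\theta)\frac{\Delta(\theta)}{f(\theta)}\,d\theta\right)^2\right) + O(\|\Delta/f\|_3^3),$$

where $\hat f := f/\|f\|_1$. Here, $O(\|\Delta/f\|_3^3)$ indicates terms of order 3 or higher.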

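Proof: We prove only the first claim. The other two can be shown in an identical manner. Writing $m_k := \frac{1}{2\pi}\int_{-\pi}^{\pi}(\Delta/f)^k\,d\theta$, we expand $\delta_{a/g}(f, f+\Delta)$ using the series $\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \ldots$ and $(1+x)^{-1} = 1 - x + x^2 - \ldots$, as follows:

$$\delta_{a/g}(f,f+\Delta) = \log\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\left(1+\frac{\Delta}{f}\right)^{-1}d\theta\right) + \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left(1+\frac{\Delta}{f}\right)d\theta = \left(-m_1 + m_2 - \frac{m_1^2}{2}\right) + \left(m_1 - \frac{m_2}{2}\right) + O(\|\Delta/f\|_3^3),$$

which proves the first claim after canceling and collecting terms. Note that $\|\Delta/f\|_k^k \le \|\Delta/f\|_3^3$ for all $k\ge3$, since $|\Delta/f|<1$, so the neglected terms are indeed of order 3 or higher. ■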
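The quadratic approximations of Proposition 5 can be probed numerically. For a perturbation $\Delta=\epsilon\,s\,f$ with a fixed hypothetical profile $s(\theta)$, the ratio of $\delta_{a/g}(f,f+\Delta)$ to half of the bracketed quadratic form, and the ratio of $\delta(f,f+\Delta)$ to the full quadratic form, should both approach one as $\epsilon\to0$, with a discrepancy of order $\epsilon$. The density, the profile and all names below are assumptions for illustration; grid averages replace the integrals.

```python
import math

N = 2048
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def mean(values):
    return sum(values) / len(values)


def delta_ag(f1, f2):
    r = [x / y for x, y in zip(f1, f2)]
    return math.log(mean(r)) - mean([math.log(v) for v in r])


f = [2.0 + math.cos(t) for t in THETAS]                                # hypothetical density
shape = [math.cos(t) + 0.5 * math.sin(2 * t) + 0.3 for t in THETAS]   # profile of Delta / f

for eps in (1e-1, 1e-2, 1e-3):
    x = [eps * s for s in shape]                     # x = Delta / f, with |x| < 1
    g = [fv * (1 + xv) for fv, xv in zip(f, x)]      # f + Delta
    quad = mean([v * v for v in x]) - mean(x) ** 2   # quadratic form of Proposition 5
    print(eps, delta_ag(f, g) / (0.5 * quad), (delta_ag(f, g) + delta_ag(g, f)) / quad)
```

Each printed row should show both ratios moving closer to one as `eps` decreases.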
Since $\delta_{a/g}(\cdot,\cdot)$, $\delta(\cdot,\cdot)$, and $\delta_{\rm KL}(\cdot,\cdot)$ do not separate power spectra which are scalar multiples of one another, we may consider equivalence classes

$$(f) := \{f_1\in\mathcal{F} : f_1 = cf,\ c\in\mathbb{R}_+\}$$

obtained from any spectral density function $f$ by a scaling factor, constant across $[-\pi,\pi]$. These can be thought of as rays. They can be identified by pointing to one particular representative. Thus, in particular, the set of rays can be identified with

$$\bar{\mathcal{F}} := \{f : f\in\mathcal{F},\ \|f\|_1 = 1\}.$$
This set can be given the structure of a manifold and thought of as a set of probability density functions on $[-\pi,\pi]$. Alternatively, we can consider spectral densities as belonging to $\mathcal{F}$ and accept the fact that we have a pseudo-Riemannian metric which vanishes along certain directions. This follows from Proposition 5, since $\delta(\cdot,\cdot)$ (and similarly $\delta_{a/g}$ and $\delta_{\rm KL}$) defines a nonnegative definite quadratic form¹ at each "point" $f\in\mathcal{F}$ via

$$\Delta \mapsto \frac{1}{2\pi}\int_{-\pi}^{\pi}\left(\frac{\Delta(\theta)}{f(\theta)}\right)^2d\theta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\Delta(\theta)}{f(\theta)}\,d\theta\right)^2. \qquad (15)$$



A rather standard way to measure distances on a manifold is to trace geodesics connecting points and compute the length of such paths. Thus, it may be of interest to characterize geodesics in our case as well. We refrain from excessively technical jargon and, following the earlier suggestion, simply assume that all integrals exist.

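Consider a path $f_\tau$, $\tau\in[0,1]$, of spectral density functions connecting two given ones, namely $f_0$ and $f_1$. Note that $f_\tau$ is a function of two arguments, the path parameter $\tau$ and the frequency $\theta$; hence, we often write $f_\tau(\theta)$. The length traversed as $\tau$ varies from 0 to 1 is simply

$$\ell(f_\tau:0,1) := \int_0^1\sqrt{\delta(f_\tau, f_{\tau+d\tau})} \qquad (16)$$

$$= \int_0^1\sqrt{\log\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_\tau(\theta)+\dot f_\tau(\theta)\,d\tau}{f_\tau(\theta)}\,d\theta\right) + \log\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f_\tau(\theta)}{f_\tau(\theta)+\dot f_\tau(\theta)\,d\tau}\,d\theta\right)}$$

$$= \int_0^1\sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi}\left(\frac{\dot f_\tau(\theta)}{f_\tau(\theta)}\right)^2d\theta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\dot f_\tau(\theta)}{f_\tau(\theta)}\,d\theta\right)^2}\;d\tau. \qquad (17)$$

In the last step we eliminated higher order terms in $d\tau$ inside the integral, since those do not contribute to the integral. Here and throughout, a dot, as in $\dot f$, is used to denote the derivative with respect to $\tau$, i.e.,

$$\dot f_\tau(\theta) := \frac{\partial f_\tau(\theta)}{\partial\tau}.$$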



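Interestingly, the expression in (17) only depends on $\dot f_\tau(\theta)/f_\tau(\theta)$. Thus, if we define

$$x_\tau := \log(f_\tau),$$

then $\dot x_\tau(\theta) = \dot f_\tau(\theta)/f_\tau(\theta)$ and

$$\ell(f_\tau:0,1) = \int_0^1\sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\theta)^2\,d\theta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\theta)\,d\theta\right)^2}\;d\tau. \qquad (18)$$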

The requirement that the end points of $f_\tau$ coincide with $f_0$ and $f_1$ readily translates into boundary conditions for $x_\tau$, namely $x_0 = \log f_0$ and $x_1 = \log f_1$. The task of finding extremals of such integrals leads to Euler-Lagrange equations for the path $x_\tau$. More specifically, the Lagrangian corresponding to (18) is



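$$L(x_\tau,\dot x_\tau,\tau) := \sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\theta)^2\,d\theta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\theta)\,d\theta\right)^2}$$

and only depends on $\dot x_\tau$. Therefore $\partial L/\partial x_\tau = 0$, and the Euler-Lagrange equations

$$\frac{\partial L}{\partial x_\tau} - \frac{d}{d\tau}\frac{\partial L}{\partial\dot x_\tau} = 0$$

simplify to $\partial L/\partial\dot x_\tau$ being independent of $\tau$. Since $\dot x_\tau$ enters in $L$ through an integral over $\theta$, the partial derivative with respect to $\dot x_\tau$ is infinitesimal. Thus, we write

$$\frac{\partial L}{\partial\dot x_\tau} = \frac{1}{2\pi}\,v(\theta)\,d\theta,$$

which is independent of $\tau$, as we just explained. Since

$$\frac{\partial L}{\partial\dot x_\tau} = \frac{1}{2L}\,\frac{\partial}{\partial\dot x_\tau}\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\vartheta)^2\,d\vartheta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\vartheta)\,d\vartheta\right)^2\right),$$

where the latter term produces again a differential in $\theta$, it follows that

$$2L\,v(\theta)\,\frac{d\theta}{2\pi} = \left(2\dot x_\tau(\theta) - 2\,\frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\vartheta)\,d\vartheta\right)\frac{d\theta}{2\pi}.$$

Alternatively,

$$\frac{\dot x_\tau(\theta) - \frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\vartheta)\,d\vartheta}{\sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\vartheta)^2\,d\vartheta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\dot x_\tau(\vartheta)\,d\vartheta\right)^2}} = v(\theta), \qquad (19)$$

which simply says that the variation of $\dot x_\tau$ about its mean, as a function of $\theta$, normalized by a "standard deviation"-like quantity, must be independent of $\tau$.

¹ If we mod out scaling and stay on a manifold of "spectral rays", this quadratic form becomes positive definite and defines a Riemannian metric.

We summarize our conclusion as follows.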

Proposition 6: Given two spectral density functions $f_0, f_1$, extremal (geodesic) paths $f_\tau$ ($\tau\in[0,1]$) connecting the two, in the sense of achieving a local extremum of the path integral $\ell(f_\tau:0,1)$, must satisfy (19) for $x_\tau = \log f_\tau$, i.e., the left-hand side of (19) must be independent of $\tau$.

Proof: The proof has been established in the arguments leading to the proposition. ■
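The condition in Proposition 6 can be illustrated by evaluating the left-hand side of (19) along two candidate paths between hypothetical densities $f_0$ and $f_1$: the logarithmic interval and, for contrast, the straight line $(1-\tau)f_0+\tau f_1$. Grid averages replace the integrals, and all names are illustrative. Only the first path should yield a left-hand side that does not change with $\tau$.

```python
import math

N = 1024
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def mean(values):
    return sum(values) / len(values)


def normalized_velocity(xdot):
    """Left-hand side of (19): (xdot - mean xdot) / sqrt(mean xdot^2 - (mean xdot)^2)."""
    m = mean(xdot)
    sd = math.sqrt(mean([v * v for v in xdot]) - m * m)
    return [(v - m) / sd for v in xdot]


f0 = [1.5 + math.cos(t) for t in THETAS]       # hypothetical endpoint densities
f1 = [1.2 + math.sin(2 * t) for t in THETAS]


def lhs_log_path(tau):
    # x_tau = (1 - tau) log f0 + tau log f1, so xdot = log f1 - log f0 whatever tau is
    return normalized_velocity([math.log(b) - math.log(a) for a, b in zip(f0, f1)])


def lhs_linear_path(tau):
    # f_tau = (1 - tau) f0 + tau f1, so xdot = fdot / f_tau = (f1 - f0) / f_tau
    return normalized_velocity([(b - a) / ((1 - tau) * a + tau * b) for a, b in zip(f0, f1)])


def spread(lhs):
    """Largest pointwise change of the left-hand side of (19) between tau = 0.1 and tau = 0.9."""
    return max(abs(p - q) for p, q in zip(lhs(0.1), lhs(0.9)))


print(spread(lhs_log_path), spread(lhs_linear_path))
```

The first spread is zero, while the straight-line path produces a left-hand side that visibly changes with $\tau$, so it is not an extremal in the sense of Proposition 6.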
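We finally verify that logarithmic intervals satisfy (19). This is rather straightforward since, for $f_\tau = f_0^{1-\tau}f_1^{\tau}$, the logarithm

$$x_\tau(\theta) = \log f_\tau(\theta) = \log f_0(\theta) + \tau\big(\log f_1(\theta) - \log f_0(\theta)\big)$$

is a linear function of $\tau$ and the derivative

$$\dot x_\tau(\theta) = \frac{d}{d\tau}\log(f_\tau(\theta)) = \log(f_1(\theta)) - \log(f_0(\theta)) = x_1(\theta) - x_0(\theta)$$

is already independent of $\tau$. The ratio $f_1/f_0$ plays a role analogous to the likelihood ratio of probability theory. The length of logarithmic intervals can be computed in terms of this ratio by simple inspection since, from the definition of $x_\tau$,

$$\dot x_\tau(\theta) = \frac{\dot f_\tau(\theta)}{f_\tau(\theta)} = \log\left(\frac{f_1(\theta)}{f_0(\theta)}\right)$$

is independent of $\tau$. Therefore the following statement holds.

Proposition 7: The length of the logarithmic path connecting two power spectral densities $f_0$ and $f_1$ is given by

$$\ell(f_\tau:0,1) = \sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi}\left(\log\frac{f_1(\theta)}{f_0(\theta)}\right)^2d\theta - \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\frac{f_1(\theta)}{f_0(\theta)}\,d\theta\right)^2}. \qquad (20)$$

Proof: The proof follows from the arguments leading to the proposition. ■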
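Formula (20) can be cross-checked by integrating (18) numerically. The sketch below uses hypothetical endpoint densities, grid averages in $\theta$ and a midpoint rule in $\tau$, and compares the closed form with the numerically integrated length of the logarithmic interval and with the length of the straight-line path $(1-\tau)f_0+\tau f_1$. Names and parameters are illustrative assumptions.

```python
import math

N = 1024
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def mean(values):
    return sum(values) / len(values)


def speed(xdot):
    """Integrand of (18): sqrt(mean(xdot^2) - mean(xdot)^2)."""
    m = mean(xdot)
    return math.sqrt(max(mean([v * v for v in xdot]) - m * m, 0.0))


def path_length(xdot_of_tau, steps=400):
    """Midpoint-rule approximation of the tau-integral in (18)."""
    return sum(speed(xdot_of_tau((k + 0.5) / steps)) for k in range(steps)) / steps


f0 = [1.5 + math.cos(t) for t in THETAS]       # hypothetical endpoints
f1 = [1.2 + math.sin(2 * t) for t in THETAS]

log_ratio = [math.log(b / a) for a, b in zip(f0, f1)]
closed_form = speed(log_ratio)                                    # formula (20)
log_path = path_length(lambda tau: log_ratio)                     # xdot is constant along f_tau
linear_path = path_length(lambda tau: [(b - a) / ((1 - tau) * a + tau * b) for a, b in zip(f0, f1)])
print(closed_form, log_path, linear_path)
```

The first two printed values coincide. The straight-line path cannot come out shorter, because in the coordinates $x_\tau=\log f_\tau$ the length (18) is the integral of a seminorm of $\dot x_\tau$, so the triangle inequality bounds it from below by the closed form (20).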
VII. Degradation of the smoothing error variance

In a way completely analogous to the previous sections, we may consider the increase in the variance of the smoothing error when a wrong choice between two alternatives is used to identify a candidate smoothing filter.

Thus, we begin with two density functions $f_1, f_2$ and assume that $f_i^{-1}\in L_1[-\pi,\pi)$ for $i\in\{1,2\}$. Accordingly, we test the optimal smoothing filter based on $f_2$ against $f_1$. As explained in Section II, the $f_2$-optimal smoothing filter gives rise to an error $u_0 - \hat u_{0|\rm past\,\&\,future}$ corresponding, via the Kolmogorov mapping, to $b_{f_2}(\theta) = h_{f_2}f_2(\theta)^{-1}$. Hence, the variance of the smoothing error divided by the $f_1$-optimal variance is



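$$\rho_{\rm smooth}(f_1,f_2) := \frac{\frac{1}{2\pi}\int_{-\pi}^{\pi}|b_{f_2}(\theta)|^2 f_1(\theta)\,d\theta}{h_{f_1}} = \frac{\frac{1}{2\pi}\int_{-\pi}^{\pi}f_2(\theta)^{-2}f_1(\theta)\,d\theta\cdot\frac{1}{2\pi}\int_{-\pi}^{\pi}f_1(\theta)^{-1}\,d\theta}{\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}f_2(\theta)^{-1}\,d\theta\right)^2}.$$

Interestingly, this can be rewritten as follows

$$\rho_{\rm smooth}(f_1,f_2) = \left(\frac{\left(\int_{-\pi}^{\pi}\left(\frac{f_1(\theta)}{f_2(\theta)}\right)^2d\phi_1(\theta)\right)^{1/2}}{\int_{-\pi}^{\pi}\frac{f_1(\theta)}{f_2(\theta)}\,d\phi_1(\theta)}\right)^2, \qquad (21)$$

where

$$d\phi_1(\theta) := \frac{f_1(\theta)^{-1}\,d\theta}{\int_{-\pi}^{\pi}f_1(\vartheta)^{-1}\,d\vartheta}$$

is a normalized measure with total variation one. Expression (21) shows the degradation as the square of the ratio of the root-mean-square of the fraction $f_1/f_2$ over its arithmetic mean. These two means, root-mean-square and arithmetic, are weighted by $d\phi_1$, which is of course dependent on one of the two arguments. However, the expression is homogeneous and does not depend on scaling of either of the two arguments $f_1$ or $f_2$. Accordingly, we may define as a distance measure

$$\delta_{\rm smooth}(f_1,f_2) := \log\big(\rho_{\rm smooth}(f_1,f_2)\big) = 2\log\left(\frac{\left(\int_{-\pi}^{\pi}\left(\frac{f_1(\theta)}{f_2(\theta)}\right)^2d\phi_1(\theta)\right)^{1/2}}{\int_{-\pi}^{\pi}\frac{f_1(\theta)}{f_2(\theta)}\,d\phi_1(\theta)}\right). \qquad (22)$$

The presence of a data-dependent integration measure may be compared to the (normalized) Kullback-Leibler divergence in (11).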
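The two expressions for $\rho_{\rm smooth}$ can be compared numerically. The sketch below assumes two hypothetical AR(1) densities, computes the degradation directly from the variance of the $f_2$-optimal smoothing error under $f_1$ (using $b_{f_2}=h_{f_2}/f_2$), and again through the weighted means in (21) with $d\phi_1\propto f_1^{-1}d\theta$. Grid sums replace the integrals, and all names are illustrative.

```python
import cmath
import math

N = 4096
THETAS = [-math.pi + 2 * math.pi * i / N for i in range(N)]


def ar1_spectrum(a, sigma2):
    """Hypothetical AR(1) density sigma2 / |1 - a e^{j theta}|^2 sampled on THETAS."""
    return [sigma2 / abs(1 - a * cmath.exp(1j * t)) ** 2 for t in THETAS]


def mean(values):
    return sum(values) / len(values)


def rho_smooth_direct(f1, f2):
    """Variance of the f2-optimal smoothing error under f1, divided by h_{f1}."""
    h1 = 1.0 / mean([1.0 / v for v in f1])
    h2 = 1.0 / mean([1.0 / v for v in f2])
    return mean([(h2 / y) ** 2 * x for x, y in zip(f1, f2)]) / h1


def rho_smooth_weighted(f1, f2):
    """Form (21): weighted mean square over squared weighted mean of f1/f2, weights ~ 1/f1."""
    w = [1.0 / v for v in f1]
    total = sum(w)
    ratio = [x / y for x, y in zip(f1, f2)]
    mean_square = sum(wi * r * r for wi, r in zip(w, ratio)) / total
    arithmetic = sum(wi * r for wi, r in zip(w, ratio)) / total
    return mean_square / arithmetic ** 2


f1, f2 = ar1_spectrum(0.8, 2.0), ar1_spectrum(0.5, 1.0)
print(rho_smooth_direct(f1, f2), rho_smooth_weighted(f1, f2), math.log(rho_smooth_weighted(f1, f2)))
```

The first two printed values agree, and the third is the corresponding $\delta_{\rm smooth}$ from (22). Rescaling either argument leaves the weighted form unchanged, as stated above.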
VIII. Reappraisal and generalizations

The expressions derived in the previous sections suggest that generalized means of the "likelihood"-like ratio $\Delta := f_1/f_2$ and their logarithms may be used as distance measures between "shapes" of density functions $f_1$ and $f_2$. More specifically, we know that for any positive function $\Delta$,

$$M_r(\Delta) \le M_s(\Delta) \quad \text{for any } -\infty<r<s<\infty,$$

where $M_r(\Delta)$ denotes the $r$-th generalized mean

$$M_r(\Delta) := \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\Delta(\theta)^r\,d\theta\right)^{1/r}.$$

Then

$$\delta_{r,s}(\Delta) := \log(M_s(\Delta)) - \log(M_r(\Delta)) \ge 0,$$

with a value which depends on how "far" $\Delta$ is from being constant. Hence, we may use $\delta_{r,s}(f_1/f_2)$ to quantify the distance between the "shapes" of $f_1$ and $f_2$, and since

$$M_0(\Delta) := \lim_{r\to0}M_r(\Delta) = e^{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log(\Delta(\theta))\,d\theta}$$

is the geometric mean of $\Delta$ (see e.g., [3, page 23]), both $\delta$ and $\delta_{a/g}$ that we encountered earlier are special cases of the above: $\delta_{a/g}(f_1,f_2) = \delta_{0,1}(f_1/f_2)$ and $\delta(f_1,f_2) = \delta_{-1,1}(f_1/f_2)$. Weighted versions of generalized means may also be used for the same purpose, as suggested in Section VII. Lengths of geodesics, as suggested in Section VI, present another possibility. Indeed, a "zoo" of possible options emerges. Assessing the practical and theoretical merits of each is the subject of a future project.
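The family $\delta_{r,s}$ is simple to evaluate on a grid. The sketch below assumes a hypothetical ratio $\Delta(\theta)=e^{0.7\cos\theta}$, computes $M_r$ with the geometric mean as the $r=0$ limit, and checks that $\delta_{0,1}$ reproduces $\delta_{a/g}$ while $\delta_{-1,1}$ reproduces the arithmetic-over-harmonic form of $\delta$. Function names are illustrative.

```python
import math


def generalized_mean(values, r):
    """M_r on a uniform grid; r = 0 is taken as the geometric-mean limit."""
    n = len(values)
    if r == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** r for v in values) / n) ** (1.0 / r)


def delta_rs(values, r, s):
    """delta_{r,s} = log M_s - log M_r, nonnegative for r < s."""
    return math.log(generalized_mean(values, s)) - math.log(generalized_mean(values, r))


N = 2048
thetas = [-math.pi + 2 * math.pi * i / N for i in range(N)]
ratio = [math.exp(0.7 * math.cos(t)) for t in thetas]   # hypothetical f1/f2

d_ag = delta_rs(ratio, 0, 1)     # corresponds to delta_{a/g}
d_sym = delta_rs(ratio, -1, 1)   # corresponds to delta (arithmetic over harmonic)
means = [generalized_mean(ratio, r) for r in (-2, -1, 0, 1, 2)]
print(d_ag, d_sym, means)
```

The list `means` is nondecreasing in $r$. For this particular symmetric $\Delta$ one also has $M_0=1$ and $M_{-1}=1/M_1$, so $\delta_{-1,1}=2\delta_{0,1}$; in general no such relation holds.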